AITopics | similarity value

similarity value

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions

Neural Information Processing Systems, Feb-12-2026, 03:41:27 GMT

AI/ML problems, and in each such case σ is assumed to be known.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.68)

Industry:

Banking & Finance (0.93)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

A method for outlier detection based on cluster analysis and visual expert criteria

Lara, Juan A., Lizcano, David, Rampérez, Víctor, Soriano, Javier

arXiv.org Artificial Intelligence, Oct-28-2025

Outlier detection is an important problem occurring in a wide range of areas. Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier detection, often as a preliminary step in order to filter out outliers and build more representative models. In this paper, we propose an outlier detection method based on a clustering process. The aim of the proposal is to overcome the specificity of many existing outlier detection techniques, which fail to take into account the inherent dispersion of domain objects. The method is based on four criteria designed to represent how human beings (experts in each domain) visually identify outliers within a set of objects after analysing the clusters. This gives it an advantage over other clustering-based outlier detection techniques that are founded on a purely numerical analysis of clusters. Our proposal has been evaluated, with satisfactory results, on data (particularly time series) from two different domains: stabilometry, a branch of medicine studying balance-related functions in human beings, and electroencephalography (EEG), a neurological exploration used to diagnose nervous system disorders. To validate the proposed method, we studied its outlier detection performance and its efficiency in terms of runtime. The results of regression analyses confirm that our proposal is useful for detecting outlier data in different domains, with a false positive rate of less than 2% and a reliability greater than 99%.
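As a rough illustration of the general idea that an outlier cutoff should depend on each cluster's own dispersion, the sketch below flags points whose distance to their cluster centroid is unusually large relative to that cluster's spread, plus all members of very small clusters. It is a hypothetical stand-in, not the paper's four visual expert criteria; the function names, the `z` factor, and `min_cluster_size` are illustrative assumptions, and cluster labels are assumed to come from any prior clustering step.

```python
import math
from collections import defaultdict

def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def dispersion_aware_outliers(points, labels, z=2.0, min_cluster_size=2):
    """Return sorted indices of points flagged as outliers.

    A point is flagged if (a) its cluster has fewer than min_cluster_size
    members, or (b) its distance to the cluster centroid exceeds
    mean + z * std of the distances within that cluster, so each cluster
    gets a cutoff scaled to its own spread.
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)

    flagged = []
    for idxs in members.values():
        if len(idxs) < min_cluster_size:
            flagged.extend(idxs)  # tiny clusters are treated as outliers
            continue
        c = centroid([points[i] for i in idxs])
        dists = [math.dist(points[i], c) for i in idxs]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        for i, d in zip(idxs, dists):
            if d > mean + z * std:
                flagged.append(i)
    return sorted(flagged)
```

With a population standard deviation, a single point's z-score in a cluster of n members cannot exceed the square root of n - 1, so this rule can only fire in reasonably large clusters; the `min_cluster_size` check covers the opposite extreme.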

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1111/exsy.12473

2510.23136

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (0.87)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine (0.88)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Evaluating NLP Embedding Models for Handling Science-Specific Symbolic Expressions in Student Texts

Bleckmann, Tom, Tschisgale, Paul

arXiv.org Artificial Intelligence, Oct-23-2025

In recent years, natural language processing (NLP) has become integral to educational data mining, particularly in the analysis of student-generated language products. For research and assessment purposes, so-called embedding models are typically employed to generate numeric representations of text that capture its semantic content for use in subsequent quantitative analyses. Yet when it comes to science-related language, symbolic expressions such as equations and formulas introduce challenges that current embedding models struggle to address. Existing research studies and practical applications often either overlook these challenges or remove symbolic expressions altogether, potentially leading to biased research findings and diminished performance of practical applications. This study therefore explores how contemporary embedding models differ in their capability to process and interpret science-related symbolic expressions. To this end, various embedding models are evaluated using physics-specific symbolic expressions drawn from authentic student responses, with performance assessed via two approaches: 1) similarity-based analyses and 2) integration into a machine learning pipeline. Our findings reveal significant differences in model performance, with OpenAI's GPT-text-embedding-3-large outperforming all other examined models, though its advantage over other models was moderate rather than decisive. Overall, this study underscores the importance, for educational data mining researchers and practitioners, of carefully selecting NLP embedding models when working with science-related language products that include symbolic expressions. The code and (partial) data are available at https://doi.org/10.17605/OSF.IO/6XQVG.
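Similarity-based analyses of embeddings usually compare vectors with cosine similarity. A minimal sketch, assuming a hypothetical `embed` function that maps text to a vector (here a toy lookup table whose numbers are made up, not any real model's output): check whether a symbolic expression lands closer to its verbal paraphrase than to an unrelated sentence.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def prefers_paraphrase(embed, expression, paraphrase, distractor):
    """True if the expression's embedding is closer to its verbal paraphrase
    than to an unrelated distractor text."""
    e = embed(expression)
    return cosine_similarity(e, embed(paraphrase)) > cosine_similarity(e, embed(distractor))

# Hypothetical toy embeddings standing in for a real embedding model.
TOY_EMBEDDINGS = {
    "F = m*a": [0.9, 0.1, 0.0],
    "force equals mass times acceleration": [0.8, 0.2, 0.1],
    "the sky is blue": [0.0, 0.1, 0.9],
}
```

Running `prefers_paraphrase` over many expression/paraphrase/distractor triples and counting successes gives one simple way to compare how different models handle symbolic input.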

data mining, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.1795

Country:

Asia (0.93)
Europe > Germany (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Education > Curriculum > Subject-Specific Education (0.47)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions

Neural Information Processing Systems, Oct-8-2025, 18:01:25 GMT

AI/ML problems, and in each such case σ is assumed to be known.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.68)

Industry:

Banking & Finance (0.93)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

SINAI at eRisk@CLEF 2022: Approaching Early Detection of Gambling and Eating Disorders with Natural Language Processing

Marmol-Romero, Alba Maria, Jimenez-Zafra, Salud Maria, Plaza-del-Arco, Flor Miriam, Molina-Gonzalez, M. Dolores, Martin-Valdivia, Maria-Teresa, Montejo-Raez, Arturo

arXiv.org Artificial Intelligence, Sep-19-2025

This paper describes the participation of the SINAI team in the eRisk@CLEF lab. Specifically, two of the proposed tasks were addressed: i) Task 1 on the early detection of signs of pathological gambling, and ii) Task 3 on measuring the severity of the signs of eating disorders. The approach presented in Task 1 combines sentence embeddings from Transformers with features related to volumetry, lexical diversity, complexity metrics, and emotion-related scores, while the approach for Task 3 is based on text similarity estimation using contextualized word embeddings from Transformers. In Task 1, our team ranked second out of 41 participant submissions, with an F1 score of 0.808. In Task 3, our team also placed second out of a total of 3 participating teams.
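The Task 1 approach concatenates sentence embeddings with hand-crafted text features. A minimal sketch of that concatenation, assuming simple stand-ins for the feature families named in the abstract: token count for volumetry, type-token ratio for lexical diversity, and mean word length as a crude complexity metric. These feature definitions and function names are illustrative assumptions, the emotion-related scores are omitted, and the embedding is assumed to come from any sentence encoder.

```python
import re

def stylometric_features(text):
    """Hypothetical volumetry, lexical-diversity and complexity features."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return {"n_tokens": 0, "type_token_ratio": 0.0, "mean_word_length": 0.0}
    return {
        "n_tokens": n,                                        # volumetry
        "type_token_ratio": len(set(tokens)) / n,             # lexical diversity
        "mean_word_length": sum(len(t) for t in tokens) / n,  # crude complexity proxy
    }

def combine(sentence_embedding, text):
    """Append the hand-crafted features to a precomputed sentence embedding."""
    f = stylometric_features(text)
    return list(sentence_embedding) + [
        f["n_tokens"], f["type_token_ratio"], f["mean_word_length"]
    ]
```

The combined vector would then feed a downstream classifier; in practice the raw token count is usually scaled so it does not dominate the embedding dimensions.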

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.14806

Country: Europe > Spain (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Knowledge Editing for Multi-Hop Question Answering Using Semantic Analysis

Simon, Dominic, Ewetz, Rickard

arXiv.org Artificial Intelligence, Aug-5-2025

Large Language Models (LLMs) require lightweight avenues for updating stored information that has fallen out of date. Knowledge Editing (KE) approaches have been successful in updating model knowledge for simple factual queries but struggle with tasks that require compositional reasoning, such as multi-hop question answering (MQA). We observe that existing knowledge editors leverage decompositional techniques that result in illogical reasoning processes. In this paper, we propose a knowledge editor for MQA based on semantic analysis, called CHECK. Our framework is based on insights from an analogy between compilers and reasoning using LLMs. Similar to how source code is first compiled before being executed, we propose to semantically analyze reasoning chains before executing them to answer questions. Reasoning chains with semantic errors are revised to ensure consistency through logic optimization and by re-prompting the LLM at a higher temperature. We evaluate the effectiveness of CHECK against five state-of-the-art frameworks on four datasets and achieve an average MQA accuracy improvement of 22.8%.
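The compiler analogy can be made concrete with a use-before-definition check. A minimal sketch, assuming a hypothetical chain format in which each sub-question is a string and `#k` refers to the answer of step k: flag references to steps that are not earlier in the chain, and flag intermediate answers that are never used. This is only one illustrative kind of semantic check; it does not reproduce CHECK's analysis, its logic optimization, or its re-prompting step.

```python
import re

# Hypothetical chain format: '#k' inside a sub-question refers to the answer of step k (1-based).
REF = re.compile(r"#(\d+)")

def semantic_errors(chain):
    """Return human-readable errors for a list of sub-question strings.

    An empty list means the chain passes both checks:
    1. every '#k' reference points to an earlier step (no use before definition);
    2. every intermediate answer is used by some later step (no dead steps).
    The final step's answer is treated as the chain's output.
    """
    errors = []
    used = set()
    for i, step in enumerate(chain, start=1):
        for ref in map(int, REF.findall(step)):
            if not 1 <= ref < i:
                errors.append(f"step {i} refers to #{ref}, which is not an earlier step")
            else:
                used.add(ref)
    for k in range(1, len(chain)):
        if k not in used:
            errors.append(f"step {k} result is never used")
    return errors
```

A chain that fails these checks would be revised before any sub-question is sent to the model, much as a compiler rejects a program before it runs.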

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2508.00914

Country:

Europe (0.67)
North America > United States (0.46)
North America > Mexico (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

On Self-improving Token Embeddings

Kubek, Mario M., Pokharel, Shiraj, Böhme, Thomas, McDaniel, Emma L., Unger, Herwig, Mikler, Armin R.

arXiv.org Artificial Intelligence, Apr-22-2025

This article introduces a novel and fast method for refining pre-trained static word or, more generally, token embeddings. By incorporating the embeddings of neighboring tokens in text corpora, it continuously updates the representation of each token, including those without pre-assigned embeddings. This approach effectively addresses the out-of-vocabulary problem, too. Operating independently of large language models and shallow neural networks, it enables versatile applications such as corpus exploration, conceptual search, and word sense disambiguation. The method is designed to enhance token representations within topically homogeneous corpora, where the vocabulary is restricted to a specific domain, resulting in more meaningful embeddings compared to general-purpose pre-trained vectors. As an example, the methodology is applied to explore storm events and their impacts on infrastructure and communities using narratives from a subset of the NOAA Storm Events database. The article also demonstrates how the approach improves the representation of storm-related terms over time, providing valuable insights into the evolving nature of disaster narratives.
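A minimal sketch of the neighbour-based refinement idea, assuming a simple update rule that the abstract does not specify: on each pass, every token's vector is blended with the mean vector of the tokens within a fixed window around its occurrences, and tokens without a pre-assigned embedding start at zero so they acquire a vector from their contexts. The `window`, `alpha`, and `passes` parameters and the function name are hypothetical.

```python
from collections import defaultdict

def refine_embeddings(sentences, init, dim, window=2, alpha=0.5, passes=3):
    """Refine token vectors using their neighbours in a tokenised corpus.

    sentences: list of token lists; init: dict token -> pre-trained vector
    (tokens missing from init start as zero vectors). Returns a dict covering
    every token in the corpus.
    """
    emb = {}
    for sent in sentences:
        for tok in sent:
            if tok not in emb:
                emb[tok] = list(init.get(tok, [0.0] * dim))  # copy; init is not mutated

    for _ in range(passes):
        ctx_sum = defaultdict(lambda: [0.0] * dim)
        ctx_cnt = defaultdict(int)
        # Accumulate the vectors of all tokens within `window` positions of each occurrence.
        for sent in sentences:
            for i, tok in enumerate(sent):
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j == i:
                        continue
                    acc = ctx_sum[tok]
                    for d, x in enumerate(emb[sent[j]]):
                        acc[d] += x
                    ctx_cnt[tok] += 1
        # Synchronous update: blend each vector with its mean context vector.
        new_emb = {}
        for tok, vec in emb.items():
            n = ctx_cnt[tok]
            if n == 0:
                new_emb[tok] = vec  # no neighbours seen; keep as is
            else:
                new_emb[tok] = [(1 - alpha) * v + alpha * s / n
                                for v, s in zip(vec, ctx_sum[tok])]
        emb = new_emb
    return emb
```

Updates are synchronous: each pass reads only the previous pass's vectors, so the result does not depend on the order in which tokens are visited.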

artificial intelligence, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2504.14808

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

WebMap -- Large Language Model-assisted Semantic Link Induction in the Web

Pokharel, Shiraj, Roßrucker, Georg P., Kubek, Mario M.

arXiv.org Artificial Intelligence, Apr-15-2025

Carrying out research tasks is only inadequately supported, if not hindered, by current web search engines. This paper therefore proposes functional extensions of WebMap, a semantically induced overlay linking structure on the web to inherently facilitate research activities.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-60433-1_8

2504.08763

Country: North America > United States (0.29)

Genre: Research Report (0.82)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Adaptive Semantic Prompt Caching with VectorQ

Schroeder, Luis Gaspar, Liu, Shu, Cuadron, Alejandro, Zhao, Mark, Krusche, Stephan, Kemper, Alfons, Zaharia, Matei, Gonzalez, Joseph E.

arXiv.org Artificial Intelligence, Feb-5-2025

Semantic prompt caches reduce the latency and cost of large language model (LLM) inference by reusing cached LLM-generated responses for semantically similar prompts. Vector similarity metrics assign a numerical score to quantify the similarity between an embedded prompt and its nearest neighbor in the cache. Existing systems rely on a static threshold to classify whether the similarity score is sufficiently high to result in a cache hit. We show that this one-size-fits-all threshold is insufficient across different prompts. We propose VectorQ, a framework to learn embedding-specific threshold regions that adapt to the complexity and uncertainty of an embedding. Through evaluations on a combination of four diverse datasets, we show that VectorQ consistently outperforms state-of-the-art systems across all static thresholds, achieving up to 12x increases in cache hit rate and error rate reductions up to 92%.
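A minimal sketch contrasting a cache lookup against per-entry thresholds with the single static threshold the abstract criticises. The adaptation rule here (raise an entry's threshold by a fixed step after a reused response is reported as wrong) is a deliberately simple, hypothetical stand-in; VectorQ learns its threshold regions differently, and the class and method names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

class AdaptiveSemanticCache:
    def __init__(self, initial_threshold=0.8, step=0.05):
        self.entries = []  # each entry: [prompt_embedding, response, threshold]
        self.initial_threshold = initial_threshold
        self.step = step

    def add(self, embedding, response):
        self.entries.append([embedding, response, self.initial_threshold])

    def lookup(self, embedding):
        """Return (entry_index, response) on a hit, or None on a miss."""
        if not self.entries:
            return None
        sims = [cosine(embedding, e[0]) for e in self.entries]
        best = max(range(len(sims)), key=sims.__getitem__)  # nearest neighbour
        if sims[best] >= self.entries[best][2]:  # compare against that entry's own threshold
            return best, self.entries[best][1]
        return None

    def report_incorrect_hit(self, index):
        """Tighten one entry's threshold after its reused response was wrong."""
        self.entries[index][2] = min(1.0, self.entries[index][2] + self.step)
```

With a static threshold, every entry would share the same cutoff; here an entry whose neighbourhood produces bad reuse becomes stricter without affecting the others.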

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.03771

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Services (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

K Nearest Neighbor-Guided Trajectory Similarity Learning

Chang, Yanchuan, Cai, Xu, Jensen, Christian S., Qi, Jianzhong

arXiv.org Artificial Intelligence, Jan-31-2025

Trajectory similarity is fundamental to many spatio-temporal data mining applications. Recent studies propose deep learning models to approximate conventional trajectory similarity measures, exploiting their fast inference time once trained. Although efficient inference has been reported, challenges remain in similarity approximation accuracy due to difficulties in trajectory granularity modeling and in exploiting similarity signals in the training data. To fill this gap, we propose TSMini, a highly effective trajectory similarity model with a sub-view modeling mechanism capable of learning multi-granularity trajectory patterns and a k nearest neighbor-based loss that guides TSMini to learn not only absolute similarity values between trajectories but also their relative similarity ranks. Together, these two innovations enable highly accurate trajectory similarity approximation. Experiments show that TSMini can outperform the state-of-the-art models by 22% in accuracy on average when learning trajectory similarity measures.
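The two training signals described above, absolute similarity values and relative similarity ranks among nearest neighbours, can be combined into one loss. A minimal sketch, assuming a squared-error term over the anchor's k nearest neighbours (under the ground-truth measure) plus a pairwise hinge term that penalises predicted orderings inconsistent with the true ones; the exact form, `margin`, and `weight` are assumptions, not TSMini's published loss.

```python
def knn_guided_loss(pred, true, k=3, margin=0.0, weight=1.0):
    """Loss for one anchor trajectory (assumes at least one candidate).

    pred[j], true[j]: predicted and ground-truth similarity between the anchor
    and candidate j. Only the anchor's k nearest neighbours under `true` are used.
    """
    knn = sorted(range(len(true)), key=lambda j: true[j], reverse=True)[:k]
    # Absolute term: predicted similarity values should match the ground truth.
    mse = sum((pred[j] - true[j]) ** 2 for j in knn) / len(knn)
    # Relative term: if j is truly more similar than l, pred[j] should exceed pred[l] by `margin`.
    hinge, pairs = 0.0, 0
    for a in range(len(knn)):
        for b in range(a + 1, len(knn)):
            j, l = knn[a], knn[b]
            if true[j] == true[l]:
                continue  # ties carry no ranking signal
            hinge += max(0.0, margin - (pred[j] - pred[l]))
            pairs += 1
    rank = hinge / pairs if pairs else 0.0
    return mse + weight * rank
```

A perfect prediction gives zero loss, while swapping the predicted scores of two neighbours is penalised by both terms.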

artificial intelligence, machine learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2502.00285

Country:

Europe > Germany (0.05)
Europe > Portugal > Porto > Porto (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
